Performance Guarantees for Empirical Markov Decision Processes with Applications to Multiperiod Inventory Models

Authors

  • William L. Cooper
  • Bharath Rangarajan
Abstract

We consider Markov decision processes with unknown transition probabilities and unknown single-period expected cost functions, and we study a method for estimating these quantities from historical or simulated data. The method requires knowledge of the system equations that govern state transitions as well as the single-period cost functions (but not the single-period expected cost functions). The estimation procedure is based upon taking expectations with respect to the empirical distribution functions of such data. Once the estimates are in place, the method computes a policy by solving the obtained “empirical” Markov decision process as if the estimates were correct. For MDPs that satisfy some conditions, we provide explicit, easily computed expressions for the probability that the procedure will produce a policy whose true expected cost is within any specified absolute distance of the actual optimal expected cost of the true Markov decision process. We also provide expressions for the number of historical or simulated data values that is sufficient for the procedure to produce a policy whose true expected cost is, with a prescribed probability, within a prescribed absolute distance of the actual optimal expected cost of the true Markov decision process. We apply our results to multiperiod inventory models. In addition, we provide a specialized analysis of such inventory models that also yields relative, rather than absolute, accuracy guarantees. We make comparisons with related results that have recently appeared, and we provide numerical examples.
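The procedure described in the abstract (replace expectations with averages over the empirical distribution of the data, then solve the resulting "empirical" MDP as if it were exact) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the lost-sales system equation, the linear ordering, holding, and shortage costs, the zero terminal cost, the finite storage capacity, and the function name `solve_empirical_inventory_mdp`.

```python
from typing import List, Sequence, Tuple


def solve_empirical_inventory_mdp(
    demands: Sequence[int],
    horizon: int,
    capacity: int,
    order_cost: float,
    holding_cost: float,
    shortage_cost: float,
) -> Tuple[List[float], List[List[int]]]:
    """Solve a finite-horizon inventory MDP built from observed demands.

    Assumed (hypothetical) model: state x is on-hand stock, action a is the
    order quantity, and the system equation is x' = max(x + a - d, 0).
    Returns the period-0 value function and policy[t][x], the order
    quantity chosen in period t when the stock level is x.
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    n = len(demands)
    value = [0.0] * (capacity + 1)  # terminal cost assumed to be zero
    policy: List[List[int]] = []

    for _ in range(horizon):  # backward induction, last period first
        new_value = [0.0] * (capacity + 1)
        decision = [0] * (capacity + 1)
        for x in range(capacity + 1):
            best_cost = float("inf")
            for a in range(capacity - x + 1):
                y = x + a  # stock level after ordering
                total = 0.0
                # Average over the data: this is the expectation with
                # respect to the empirical distribution of the demands.
                for d in demands:
                    leftover = max(y - d, 0)  # assumed system equation
                    unmet = max(d - y, 0)
                    total += (
                        order_cost * a
                        + holding_cost * leftover
                        + shortage_cost * unmet
                        + value[leftover]
                    )
                expected = total / n
                if expected < best_cost:
                    best_cost = expected
                    decision[x] = a
            new_value[x] = best_cost
        value = new_value
        policy.insert(0, decision)  # keep periods in chronological order
    return value, policy
```

Calling the function with a list of historical demand observations returns the policy that is optimal for the empirical model. The paper's guarantees concern how close the true expected cost of such a policy is to the true optimum; this sketch does not compute those bounds.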

Similar resources

Optimality Inequalities for Average Cost Markov Decision Processes and the Stochastic Cash Balance Problem

For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural applications of ou...

Revenue Management for Parallel Flights with Customer-Choice Behavior

We consider the simultaneous seat-inventory control of a set of parallel flights between a common origin and destination with dynamic customer choice among the flights. We formulate the problem as an extension of the classic multiperiod, single-flight “block demand” revenue management model. The resulting Markov decision process is quite complex, owing to its multidimensional state space and th...

An Integrated Queuing Model for Site Selection and Inventory Storage Planning of a Distribution Center with Customer Loss Consideration

Keywords: Discrete facility location, Distribution center, Logistics, Inventory policy, Queueing theory, Markov processes. The distribution center location problem is a crucial question for logistics decision makers. The optimization of these decisions needs careful attention to the fixed facility costs, inventory costs, transportation costs and customer responsiveness. In this paper we stu...

A Variance Analysis for POMDP Policy Evaluation

Partially Observable Markov Decision Processes have been studied widely as a model for decision making under uncertainty, and a number of methods have been developed to find the solutions for such processes. Such studies often involve calculation of the value function of a specific policy, given a model of the transition and observation probabilities, and the reward. These models can be learned...

17 Water Reservoir Applications of Markov Decision Processes

Decision problems in water resources management are usually stochastic, dynamic and multidimensional. MDP models have been used since the early fifties for the planning and operation of reservoir systems because the natural water inflows can be modeled using Markovian stochastic processes and the transition equations of mass conservation for the reservoir storages are akin to those found in invent...

Journal title:
  • Operations Research

Volume 60  Issue

Pages  -

Publication date 2012